558 research outputs found
Local Ranking Problem on the BrowseGraph
The "Local Ranking Problem" (LRP) is related to the computation of a
centrality-like rank on a local graph, where the scores of the nodes could
significantly differ from the ones computed on the global graph. Previous work
has studied LRP on the hyperlink graph but never on the BrowseGraph, namely a
graph where nodes are webpages and edges are browsing transitions. Recently,
this graph has received more and more attention in many different tasks such as
ranking, prediction, and recommendation. However, a web server observes only
the browsing traffic performed on its own pages (the local BrowseGraph); as a
consequence, ranks computed locally can suffer estimation errors, which hinders
the growing number of applications in the state of the art. Also, although
the divergence between the local and global ranks has been measured, the
possibility of estimating such divergence using only local knowledge has been
mainly overlooked. These aspects are of great interest for online service
providers who want to: (i) gauge their ability to correctly assess the
importance of their resources only based on their local knowledge, and (ii)
take into account real user browsing flows, which capture actual user interest
better than the static hyperlink network. We study the LRP on a
BrowseGraph from a large news provider, considering as subgraphs the
aggregations of browsing traces of users coming from different domains. We show
that the distance between the two rankings can be accurately predicted based
only on structural information of the local graph, achieving an average rank
correlation as high as 0.8.
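The computation the abstract describes can be illustrated with a small sketch: a minimal power-iteration PageRank run on a toy global graph and on a local subgraph of it, with the divergence between the two rankings measured by Spearman rank correlation. The graph, damping factor, and choice of correlation measure are illustrative assumptions, not the paper's dataset or exact methodology.

```python
def pagerank(edges, nodes, d=0.85, iters=100):
    """Plain power-iteration PageRank over a directed edge list."""
    out = {n: [] for n in nodes}
    for u, v in edges:
        out[u].append(v)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1.0 - d) / len(nodes) for n in nodes}
        for u in nodes:
            targets = out[u] if out[u] else nodes  # dangling node: spread uniformly
            for v in targets:
                new[v] += d * rank[u] / len(targets)
        rank = new
    return rank


def spearman(xs, ys):
    """Spearman rank correlation (no tie correction) between two score lists."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for pos, i in enumerate(order):
            r[i] = pos
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Toy "global" BrowseGraph and a "local" view restricted to one provider's pages.
global_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "a"), ("e", "c")]
global_nodes = ["a", "b", "c", "d", "e"]
local_nodes = ["a", "b", "c"]
local_edges = [(u, v) for u, v in global_edges if u in local_nodes and v in local_nodes]

g = pagerank(global_edges, global_nodes)
l = pagerank(local_edges, local_nodes)
rho = spearman([g[n] for n in local_nodes], [l[n] for n in local_nodes])
```

Here rho plays the role of the local-versus-global rank agreement that the paper predicts from local structural features alone.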
A framework for space-efficient string kernels
String kernels are typically used to compare genome-scale sequences whose
length makes alignment impractical, yet their computation is based on data
structures that are either space-inefficient or incur large slowdowns. We show
that a number of exact string kernels, like the k-mer kernel, the substring
kernels, a number of length-weighted kernels, the minimal absent words kernel,
and kernels with Markovian corrections, can all be computed in time and
in bits of space in addition to the input, using just a
data structure on the Burrows-Wheeler transform of the
input strings, which takes time per element in its output. The same
bounds hold for a number of measures of compositional complexity based on
multiple values of k, like the k-mer profile and the k-th order empirical
entropy, and for calibrating the value of k using the data.
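As a concrete reference point for one of the kernels mentioned, here is a minimal, naive k-mer (spectrum) kernel in Python. It uses a hash map of k-mer counts rather than the space-efficient Burrows-Wheeler-based structures the paper proposes, so it illustrates only what is computed, not how the paper computes it.

```python
from collections import Counter
from math import sqrt


def kmer_counts(s, k):
    """Counts of all length-k substrings of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def kmer_kernel(s, t, k):
    """Unnormalized k-mer (spectrum) kernel: inner product of k-mer counts."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(cs[w] * ct[w] for w in cs if w in ct)


def cosine_kmer(s, t, k):
    """Kernel normalized to [0, 1] (cosine of the two count vectors)."""
    num = kmer_kernel(s, t, k)
    den = sqrt(kmer_kernel(s, s, k) * kmer_kernel(t, t, k))
    return num / den if den else 0.0
```

The naive version stores every distinct k-mer explicitly, which is exactly the space cost the paper's BWT-based framework avoids.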
On landmark selection and sampling in high-dimensional data analysis
In recent years, the spectral analysis of appropriately defined kernel
matrices has emerged as a principled way to extract the low-dimensional
structure often prevalent in high-dimensional data. Here we provide an
introduction to spectral methods for linear and nonlinear dimension reduction,
emphasizing ways to overcome the computational limitations currently faced by
practitioners with massive datasets. In particular, a data subsampling or
landmark selection process is often employed to construct a kernel based on
partial information, followed by an approximate spectral analysis termed the
Nyström extension. We provide a quantitative framework to analyse this
procedure, and use it to demonstrate algorithmic performance bounds on a range
of practical approaches designed to optimize the landmark selection process. We
compare the practical implications of these bounds by way of real-world
examples drawn from the field of computer vision, in which low-dimensional
manifold structure is shown to emerge from high-dimensional video data streams. Comment: 18 pages, 6 figures, submitted for publication.
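The Nyström extension described above can be sketched in a few lines: sample landmarks, eigendecompose the small landmark kernel block, and extend its eigenvectors to all points to obtain a low-rank approximation of the full kernel matrix. The synthetic data, kernel bandwidth, and uniform landmark sampling are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # toy data standing in for real features


def rbf(A, B, gamma=0.05):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)


m = 20                                           # number of landmarks
idx = rng.choice(len(X), size=m, replace=False)  # uniform landmark selection
C = rbf(X, X[idx])                               # n x m cross-kernel block
W = C[idx]                                       # m x m landmark-landmark block

# Nystrom extension: eigendecompose the small block W and extend to all points,
# yielding a rank-m approximation K ~ U U^T of the full kernel matrix.
vals, vecs = np.linalg.eigh(W)
keep = vals > 1e-10                              # drop numerically null directions
U = (C @ vecs[:, keep]) / np.sqrt(vals[keep])

K_full = rbf(X, X)
K_approx = U @ U.T
err = np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full)
```

How small err is for a given m depends entirely on how the landmarks are chosen, which is exactly the selection process the bounds in this work address.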
Neuropathology in COVID-19 autopsies is defined by microglial activation and lesions of the white matter with emphasis in cerebellar and brain stem areas
Introduction: This study aimed to investigate microglial and macrophage activation in 17 patients who died in the context of a COVID-19 infection in 2020 and 2021. Methods: Through immunohistochemical analysis, the lysosomal marker CD68 was used to detect diffuse parenchymal microglial activity, pronounced perivascular macrophage activation, and macrophage clusters. COVID-19 patients were compared to control patients and grouped with regard to clinical aspects. Detection of viral proteins was attempted in different regions using multiple commercially available antibodies. Results: Microglial and macrophage activation was most pronounced in the white matter, with emphasis on brain stem and cerebellar areas. Analysis of lesion patterns yielded no correlation between disease severity and neuropathological changes. The occurrence of macrophage clusters could not be associated with a severe course of disease or with preconditions, but represents a more advanced stage of microglial and macrophage activation. Severe neuropathological changes in COVID-19 were comparable to those in severe influenza. Hypoxic damage was not a confounder of the described neuropathology. The macrophage/microglia reaction was less pronounced in post-COVID-19 patients, but still detectable, e.g., in the brain stem. Commercially available antibodies for the detection of SARS-CoV-2 viral material in immunohistochemistry yielded no specific signal over controls. Conclusion: The presented microglial and macrophage activation might be an explanation for the long COVID syndrome.
Reproducing Kernels of Generalized Sobolev Spaces via a Green Function Approach with Distributional Operators
In this paper we introduce a generalized Sobolev space by defining a
semi-inner product formulated in terms of a vector distributional operator
consisting of finitely or countably many distributional operators, which are
defined on the dual space of the Schwartz space. The types of
operators we consider include not only differential operators, but also more
general distributional operators such as pseudo-differential operators. We
deduce that a certain appropriate full-space Green function with respect to
this operator becomes a conditionally positive
definite function. In order to support this claim we ensure that the
distributional adjoint operator is
well-defined in the distributional sense. Under sufficient conditions, the
native space (reproducing-kernel Hilbert space) associated with the Green
function can be isometrically embedded into or even be isometrically
equivalent to a generalized Sobolev space. As an application, we take linear
combinations of translates of the Green function with possibly added polynomial
terms and construct a multivariate minimum-norm interpolant to data
values sampled from an unknown generalized Sobolev function at data sites
located in some set. We provide several examples, such
as Mat\'ern kernels or Gaussian kernels, that illustrate how many
reproducing-kernel Hilbert spaces of well-known reproducing kernels are
isometrically equivalent to a generalized Sobolev space. These examples further
illustrate how we can rescale the Sobolev spaces by the vector distributional
operator. Introducing the notion of scale as part of the
definition of a generalized Sobolev space may help us to choose the "best"
kernel function for kernel-based approximation methods. Comment: Updated version of the article published in Numer. Math.; close to Qi Ye's Ph.D.
thesis (\url{http://mypages.iit.edu/~qye3/PhdThesis-2012-AMS-QiYe-IIT.pdf}).
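A minimal sketch of the minimum-norm kernel interpolant described above, using a Matérn 3/2 kernel (one of the kernel families the abstract mentions). The sample function, kernel shape parameter, and data sites are illustrative assumptions, not examples from the paper.

```python
import numpy as np


def matern32(x, y, eps=2.0):
    """Matern 3/2 kernel with shape parameter eps (an illustrative choice)."""
    r = np.abs(x[:, None] - y[None, :])
    return (1.0 + np.sqrt(3.0) * eps * r) * np.exp(-np.sqrt(3.0) * eps * r)


# Minimum-norm kernel interpolant: s(x) = sum_j c_j K(x, x_j), with the
# coefficients obtained by solving K c = y at the data sites.
x_data = np.linspace(0.0, 1.0, 9)
y_data = np.sin(2.0 * np.pi * x_data)          # stand-in for the unknown function
K = matern32(x_data, x_data)
coef = np.linalg.solve(K, y_data)


def interpolant(x_eval):
    return matern32(np.atleast_1d(x_eval), x_data) @ coef
```

Rescaling eps changes the native space of the kernel, which is the "scale" idea the abstract formalizes through the vector distributional operator.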
Statistical Mechanical Development of a Sparse Bayesian Classifier
The demand for extracting rules from high-dimensional real-world data is
increasing in various fields. However, the possible redundancy of such data
sometimes makes it difficult to obtain a good generalization ability for novel
samples. To resolve this problem, we provide a scheme that reduces the
effective dimensions of data by pruning redundant components for bicategorical
classification based on the Bayesian framework. First, the potential of the
proposed method is confirmed in ideal situations using the replica method.
Unfortunately, performing the scheme exactly is computationally difficult, so
we next develop a tractable approximation algorithm, which turns out to offer
nearly optimal performance in ideal cases when the system size is large.
Finally, the efficacy of the developed classifier is experimentally examined
for a real-world problem of colon cancer classification, which shows that the
developed method can be practically useful. Comment: 13 pages, 6 figures.
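The paper's replica-based Bayesian scheme is not reproduced here. As a hedged stand-in for the same idea (pruning redundant components in binary classification), the sketch below uses l1-regularized logistic regression fitted by proximal gradient descent (ISTA), with synthetic data in place of the colon cancer dataset.

```python
import numpy as np


def fit_sparse_logistic(X, y, lam=0.1, lr=0.1, iters=2000):
    """l1-regularized logistic regression via proximal gradient (ISTA).

    y must be in {-1, +1}; redundant input dimensions are pruned to zero
    by the soft-thresholding step.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        margin = y * (X @ w)
        grad = -(X.T @ (y / (1.0 + np.exp(margin)))) / n        # logistic loss gradient
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
    return w


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
true_w = np.array([2.0, -2.0] + [0.0] * 8)     # only the first 2 dims are informative
y = np.sign(X @ true_w + 0.1 * rng.normal(size=200))
w = fit_sparse_logistic(X, y)
acc = float(np.mean(np.sign(X @ w) == y))      # training accuracy
```

The soft-thresholding step plays the role of pruning: weights on the eight redundant dimensions are driven to (near) zero while the informative ones survive.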
Knot selection in sparse Gaussian processes with a variational objective function
Sparse, knot-based Gaussian processes have enjoyed considerable success as scalable approximations of full Gaussian processes. Certain sparse models can be derived through specific variational approximations to the true posterior, and knots can be selected to minimize the Kullback-Leibler divergence between the approximate and true posterior. While this has been a successful approach, simultaneous optimization of the knots can be slow due to the number of parameters being optimized. Furthermore, few methods have been proposed for selecting the number of knots, and no experimental results exist in the literature. We propose a one-at-a-time knot selection algorithm based on Bayesian optimization to select both the number and the locations of the knots. We showcase the competitive performance of this method relative to simultaneous optimization of the knots on three benchmark datasets, at a fraction of the computational cost.
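A simplified sketch of one-at-a-time knot selection: instead of the Bayesian-optimization objective proposed here, this stand-in greedily adds the knot that most reduces trace(K - Q), the trace term that appears in Titsias-style variational bounds for sparse GPs, where Q is the knot-based approximation of K. Kernel, data, and objective are illustrative assumptions.

```python
import numpy as np


def rbf(a, b, ls=0.5):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)


def greedy_knots(K, m):
    """Add knots one at a time, each time picking the candidate that most
    reduces trace(K - Q), where Q = C W^{-1} C^T is the knot-based
    (inducing-point) approximation of K."""
    n = len(K)
    chosen, score = [], np.trace(K)
    for _ in range(m):
        best_j, best_score = None, np.inf
        for j in range(n):
            if j in chosen:
                continue
            idx = chosen + [j]
            C = K[:, idx]
            W = K[np.ix_(idx, idx)] + 1e-10 * np.eye(len(idx))  # jitter for stability
            Q = C @ np.linalg.solve(W, C.T)
            s = np.trace(K - Q)
            if s < best_score:
                best_j, best_score = j, s
        chosen.append(best_j)
        score = best_score
    return chosen, score


rng = np.random.default_rng(2)
X = np.sort(rng.uniform(0.0, 1.0, 40))
K = rbf(X, X)
knots, residual_trace = greedy_knots(K, 5)
```

Because only one knot is optimized per step, each step is cheap; this is the same motivation behind the one-at-a-time strategy in the abstract, which additionally uses the residual to decide when to stop adding knots.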
Selective 2′-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis
SHAPE chemistries exploit small electrophilic reagents that react with the 2′-hydroxyl group to interrogate RNA structure at single-nucleotide resolution. Mutational profiling (MaP) identifies modified residues based on the ability of reverse transcriptase to misread a SHAPE-modified nucleotide, with the resulting mutations counted by massively parallel sequencing. The SHAPE-MaP approach measures the structure of large and transcriptome-wide systems as accurately as for simple model RNAs. This protocol describes the experimental steps, implemented over three days, required to perform SHAPE probing and construct multiplexed SHAPE-MaP libraries suitable for deep sequencing. These steps include RNA folding and SHAPE structure probing, mutational profiling by reverse transcription, library construction, and sequencing. Automated processing of MaP sequencing data is accomplished using two software packages. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots, and provides useful troubleshooting information, often within an hour. SuperFold uses these data to model RNA secondary structures, identify regions with well-defined structures, and visualize probable and alternative helices, often in under a day. We illustrate these algorithms with the E. coli thiamine pyrophosphate riboswitch, E. coli 16S rRNA, and HIV-1 genomic RNAs. SHAPE-MaP can be used to make nucleotide-resolution biophysical measurements of individual RNA motifs, rare components of complex RNA ensembles, and entire transcriptomes. The straightforward MaP strategy greatly expands the number, length, and complexity of analyzable RNA structures.
Applications of Polynomial Chaos-Based Cokriging to Aerodynamic Design Optimization Benchmark Problems
In this work, polynomial chaos-based Cokriging (PC-Cokriging) is applied to a benchmark aerodynamic design optimization problem. The aim is to perform fast design optimization using this multifidelity metamodel. Multifidelity metamodels use information at multiple levels of fidelity to make accurate and fast predictions: a larger amount of lower-fidelity data can reveal important trends that complement a limited amount of high-fidelity (HF) data. The PC-Cokriging metamodel is a multivariate version of the polynomial chaos-based Kriging (PC-Kriging) metamodel, and its construction is similar to Cokriging. It combines the advantages of the interpolation-based Kriging metamodel and the regression-based polynomial chaos expansions (PCE). In this work, the PC-Cokriging model is compared to other metamodels, namely PCE, Kriging, PC-Kriging, and Cokriging. These metamodels are first compared in terms of global accuracy, measured by root mean squared error (RMSE) and normalized RMSE (NRMSE) for different sample sets, each with an increasing number of HF samples. The metamodels are then used to find the optimum. Once the optimum design is found, computational fluid dynamics (CFD) simulations are rerun and the results are compared to each other. In this study, a drag reduction of 73.1 counts was achieved. The multifidelity metamodels required 19 HF samples along with 1,055 low-fidelity samples to converge to the optimum drag value of 129 counts, while the single-fidelity models required 155 HF samples to do the same.
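The accuracy metrics used in the comparison (RMSE and NRMSE) can be made concrete with a short sketch. The surrogate here is a plain cubic least-squares fit standing in for the metamodels above, and normalizing NRMSE by the response range is one common convention, assumed rather than taken from this work.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted responses."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def nrmse(y_true, y_pred):
    """RMSE normalized by the range of the true responses (one common choice)."""
    return rmse(y_true, y_pred) / float(y_true.max() - y_true.min())


# Toy stand-in surrogate: a cubic least-squares fit to a handful of
# "high-fidelity" samples of a smooth response, evaluated on a test grid.
def f(x):
    return x * np.sin(3.0 * x)


x_hf = np.linspace(0.0, 2.0, 10)               # illustrative HF sample sites
coeffs = np.polyfit(x_hf, f(x_hf), deg=3)
x_test = np.linspace(0.0, 2.0, 200)
e_rmse = rmse(f(x_test), np.polyval(coeffs, x_test))
e_nrmse = nrmse(f(x_test), np.polyval(coeffs, x_test))
```

Repeating this evaluation over sample sets of increasing HF size is the global-accuracy comparison the abstract describes, before the surrogates are handed to the optimizer.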